Estimating the unseen: A sublinear-sample canonical estimator of distributions
Authors: Gregory Valiant and Paul Valiant
Abstract
We introduce a new approach to characterizing the unobserved portion of a distribution, which provides sublinear-sample additive estimators for a class of properties that includes entropy and distribution support size. Together with the lower bounds proven in the companion paper [29], this settles the longstanding question of the sample complexities of these estimation problems (up to constant factors). Our algorithm estimates these properties up to an arbitrarily small additive constant, using O(n/log n) samples; [29] shows that no algorithm on o(n/log n) samples can achieve this (where n is a bound on the support size, or, in the case of estimating the support size, 1/n is a lower bound on the probability of any element of the domain). Previously, no explicit sublinear-sample algorithms for either of these problems were known. Additionally, our algorithm runs in time linear in the number of samples used.

Think not, because no man sees,
Such things will remain unseen.
– Henry Wadsworth Longfellow, from "The Builders"
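For readers unfamiliar with the setting, the short Python sketch below is purely illustrative and is not the paper's algorithm: it computes the sample's "fingerprint" (how many domain elements appear exactly i times) and the naive plug-in entropy estimate, which is badly biased when the sample is small relative to the support size; the paper's contribution is an estimator that corrects for the unseen portion of the distribution using only O(n/log n) samples. The function names and toy sample are hypothetical.

# Illustrative sketch only (not the paper's estimator).
from collections import Counter
import math

def fingerprint(sample):
    """Map i -> number of distinct elements observed exactly i times."""
    counts = Counter(sample)          # element -> multiplicity
    return Counter(counts.values())   # multiplicity -> number of elements

def plugin_entropy(sample):
    """Empirical (plug-in) entropy in nats; badly biased for small samples."""
    m = len(sample)
    counts = Counter(sample)
    return -sum((c / m) * math.log(c / m) for c in counts.values())

sample = ["a", "b", "a", "c", "d", "a", "b", "e"]
print(fingerprint(sample))     # Counter({1: 3, 2: 1, 3: 1})
print(plugin_entropy(sample))  # about 1.49 nats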
Similar References
Estimating the Unseen: Improved Estimators for Entropy and other Properties
Recently, Valiant and Valiant [1, 2] showed that a class of distributional properties, which includes such practically relevant properties as entropy, the number of distinct elements, and distance metrics between pairs of distributions, can be estimated given a sublinear-sized sample. Specifically, given a sample consisting of independent draws from any distribution over at most n distinct elem...
Minimax Estimator of a Lower Bounded Parameter of a Discrete Distribution under a Squared Log Error Loss Function
The problem of estimating the parameter θ, when it is restricted to a lower-bounded interval, in a class of discrete distributions including the Binomial, Negative Binomial, and discrete Weibull, is considered. We give necessary and sufficient conditions under which the Bayes estimator of θ with respect to a two-point boundary-supported prior is minimax under the squared log error loss function...
Estimating a Bounded Normal Mean Relative to Squared Error Loss Function
Consider a random sample from a normal distribution with unknown mean and known variance. The usual estimator of the mean, i.e., the sample mean, is the maximum likelihood estimator, which under the squared error loss function is minimax and admissible. In many practical situations, the mean is known in advance to lie in a bounded interval. In this case, the maximum likelihood estimator...
INSPECTRE: Privately Estimating the Unseen
We develop differentially private methods for estimating various distributional properties. Given a sample from a discrete distribution p, some functional f , and accuracy and privacy parameters α and ε, the goal is to estimate f(p) up to accuracy α, while maintaining ε-differential privacy of the sample. We prove almost-tight bounds on the sample size required for this problem for several func...
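As a generic illustration of the privacy requirement described above (and not the method of this paper), the sketch below releases a numeric estimate with ε-differential privacy via the Laplace mechanism; the estimate value and the sensitivity bound delta_f are hypothetical placeholders that would have to be derived for the specific estimator.

# Hypothetical sketch, not the INSPECTRE algorithm: epsilon-differential
# privacy via the Laplace mechanism. delta_f must upper-bound how much the
# underlying estimator can change when one sample point is replaced.
import numpy as np

def private_release(estimate, delta_f, epsilon, rng=None):
    """Release `estimate` with epsilon-DP by adding Laplace(delta_f/epsilon) noise."""
    rng = rng if rng is not None else np.random.default_rng()
    return estimate + rng.laplace(loc=0.0, scale=delta_f / epsilon)

# Usage: a non-private entropy estimate of 1.49, assumed sensitivity 0.05, epsilon 0.5
print(private_release(1.49, delta_f=0.05, epsilon=0.5))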
Comparison of Small Area Estimation Methods for Estimating Unemployment Rate
Extended Abstract. In recent years, the need for small area estimation has greatly increased for large surveys, particularly household surveys at the Statistical Centre of Iran (SCI), because of the costs and respondent burden. The lack of suitable auxiliary variables between two decennial housing and population censuses is a challenge for SCI in using these methods. In general, the...
Journal: Electronic Colloquium on Computational Complexity (ECCC)
Volume: 17
Publication date: 2010